Tree-based Multivariate Regression and Density Estimation with Right-censored Data

Authors

  • Annette M. Molinaro
  • Sandrine Dudoit
  • Mark J. van der Laan

Abstract

We propose a unified strategy for estimator construction, selection, and performance assessment in the presence of censoring. This approach is entirely driven by the choice of a loss function for the full (uncensored) data structure and can be stated in terms of the following three main steps. (1) First, define the parameter of interest as the minimizer of the expected loss, or risk, for a full-data loss function chosen to represent the desired measure of performance. Map the full-data loss function into an observed (censored) data loss function that has the same expected value and leads to an efficient estimator of this risk. (2) Next, construct candidate estimators based on the loss function for the observed data. (3) Then, apply cross-validation to estimate risk based on the observed-data loss function and to select an optimal estimator among the candidates. A number of common estimation procedures follow this approach in the full-data situation, but depart from it when faced with the obstacle of evaluating the loss function for censored observations. Here, we argue that one can, and should, also adhere to this estimation road map in censored-data situations. Tree-based methods, where the candidate estimators in Step 2 are generated by recursive binary partitioning of a suitably defined covariate space, provide a striking example of the chasm between estimation procedures for full data and for censored data (e.g., regression trees as in CART for uncensored data versus their adaptations to censored data). Common approaches for regression trees bypass the risk estimation problem for censored outcomes by altering the node-splitting and tree-pruning criteria in ways that are specific to right-censored data. This article describes an application of our unified methodology to tree-based estimation with censored data.
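
A minimal Python sketch of steps (1) to (3), under assumptions of our own rather than the paper's code: the full-data loss is squared error on the survival time; it is mapped to an observed-data loss by inverse probability of censoring weighting (IPCW), Δ / G(T̃−), one standard mapping that preserves the expected loss when G > 0 and the censoring model is correct; censoring is assumed independent of the covariates, so G is a covariate-free Kaplan-Meier estimate; and the candidates are only a one-node tree and a single-split stump. All function names (km_censoring_survival, fit_mean, fit_stump, cv_select) and the toy data are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Predictor = Callable[[float], float]


def km_censoring_survival(times: List[float], events: List[int]) -> Callable[[float], float]:
    """Kaplan-Meier estimate of G(t) = P(C >= t), the censoring survivor
    function, obtained by treating censored observations (event == 0) as
    the 'events'. Assumption: censoring does not depend on the covariates."""
    jumps: List[Tuple[float, float]] = []
    surv = 1.0
    for s in sorted({t for t, d in zip(times, events) if d == 0}):
        at_risk = sum(1 for t in times if t >= s)
        n_cens = sum(1 for t, d in zip(times, events) if t == s and d == 0)
        surv *= 1.0 - n_cens / at_risk
        jumps.append((s, surv))

    def G(t: float) -> float:
        # Left-continuous version: product over censoring times strictly before t.
        value = 1.0
        for s, v in jumps:
            if s >= t:
                break
            value = v
        return value

    return G


def fit_mean(x: List[float], y: List[float], w: List[float]) -> Predictor:
    """Candidate 1: IPCW-weighted mean of y (a tree with one node)."""
    m = sum(wi * yi for yi, wi in zip(y, w)) / sum(w)
    return lambda xi: m


def fit_stump(x: List[float], y: List[float], w: List[float]) -> Predictor:
    """Candidate 2: one binary split on x minimising IPCW-weighted squared
    error; each child predicts its weighted mean. Zero-weight (censored)
    points cannot change the loss, so they are dropped."""
    obs = [(xi, yi, wi) for xi, yi, wi in zip(x, y, w) if wi > 0]
    xs = sorted({xi for xi, _, _ in obs})
    best = None
    for a, b in zip(xs, xs[1:]):
        c = (a + b) / 2  # split halfway between adjacent covariate values
        left = [(yi, wi) for xi, yi, wi in obs if xi <= c]
        right = [(yi, wi) for xi, yi, wi in obs if xi > c]
        ml = sum(yi * wi for yi, wi in left) / sum(wi for _, wi in left)
        mr = sum(yi * wi for yi, wi in right) / sum(wi for _, wi in right)
        sse = sum(wi * (yi - ml) ** 2 for yi, wi in left) + sum(
            wi * (yi - mr) ** 2 for yi, wi in right)
        if best is None or sse < best[0]:
            best = (sse, c, ml, mr)
    if best is None:  # fewer than two distinct covariate values
        return fit_mean(x, y, w)
    _, c, ml, mr = best
    return lambda xi: ml if xi <= c else mr


def cv_select(x, y, delta, candidates: Dict[str, Callable], V: int = 5):
    """Step (3): V-fold cross-validated IPCW risk; returns the best name."""
    G = km_censoring_survival(y, delta)  # simplification: G fit once on all data
    w = [d / G(t) if d else 0.0 for t, d in zip(y, delta)]  # Delta / G(T~-)
    n = len(y)
    folds = [list(range(v, n, V)) for v in range(V)]
    risks: Dict[str, float] = {}
    for name, fit in candidates.items():
        total = 0.0
        for val in folds:
            held_out = set(val)
            tr = [i for i in range(n) if i not in held_out]
            psi = fit([x[i] for i in tr], [y[i] for i in tr], [w[i] for i in tr])
            total += sum(w[i] * (y[i] - psi(x[i])) ** 2 for i in val)
        risks[name] = total / n
    return min(risks, key=risks.get), risks


# Toy data (made up): survival time jumps at x = 20; every 4th subject censored.
x = list(range(40))
T = [(1.0 if i < 20 else 5.0) + 0.1 * (i % 3) for i in x]
delta = [0 if i % 4 == 3 else 1 for i in x]
y = [t if d else 0.5 * t for t, d in zip(T, delta)]  # observed min(T, C)
best, risks = cv_select(x, y, delta, {"mean": fit_mean, "stump": fit_stump})
print(best)  # stump
```

Censored subjects contribute zero loss and each uncensored subject is up-weighted by 1/G, so the same weighted loss drives splitting (step 2) and cross-validated selection (step 3). A full tree would recurse the split and prune using the same cross-validated risk.
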
The approach encompasses univariate outcome prediction, multivariate outcome prediction, and density estimation, simply by defining a suitable loss function for each of these problems. The proposed method for tree-based estimation with censoring is evaluated using a simulation study and the analysis of CGH copy number and survival data from breast cancer patients.
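
Since only the full-data loss changes from one problem to the next, a sketch can keep a single IPCW wrapper and swap in different losses. The losses below (univariate squared error, a multivariate sum of squared errors, and the negative log-density) are common illustrative choices assumed here; the helper names are hypothetical, the paper's exact loss definitions may differ, and for the multivariate case we assume the whole outcome vector is observed whenever Δ = 1.

```python
import math
from typing import Callable, Sequence


def ipcw(full_data_loss: float, delta: int, G_at_T: float) -> float:
    """Observed-data loss Delta / G(T~- | X) * L(full data); censored
    observations (delta == 0) contribute zero. Requires G_at_T > 0."""
    return full_data_loss / G_at_T if delta else 0.0


def squared_error(y: float, pred: float) -> float:
    """Univariate outcome prediction: (Y - psi(X))^2."""
    return (y - pred) ** 2


def multivariate_squared_error(y: Sequence[float], pred: Sequence[float]) -> float:
    """Multivariate outcome prediction: squared error summed over components
    (assumes an identity weight matrix)."""
    return sum((a - b) ** 2 for a, b in zip(y, pred))


def neg_log_density(y: float, density: Callable[[float], float]) -> float:
    """Density estimation: -log f(Y) for a candidate density f."""
    return -math.log(density(y))


def exp_density(t: float) -> float:
    """Illustrative candidate density: exponential with rate 1."""
    return math.exp(-t)


print(ipcw(squared_error(3.0, 1.0), 1, 0.5))                             # 8.0
print(ipcw(multivariate_squared_error([1.0, 2.0], [0.0, 0.0]), 1, 0.5))  # 10.0
print(ipcw(neg_log_density(2.0, exp_density), 0, 0.8))                   # 0.0
```

Any of these losses can be plugged into the splitting and cross-validation steps unchanged, which is what lets one tree-building procedure cover all three problems.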

Similar resources

Comparison of three Estimation Procedures for Weibull Distribution based on Progressive Type II Right Censored Data

In this paper, based on progressive Type II right-censored data, we consider the MLE and AMLE of the scale and shape parameters of the Weibull distribution. A new type of parameter estimation, named inverse estimation, is also introduced for both the shape and scale parameters of the Weibull distribution; it relies on properties of order statistics. We use simulations and study the biases a...

Nonparametric estimation of the multivariate distribution function in a censored regression model with applications

In a regression model with univariate censored responses, a new estimator of the joint distribution function of the covariates and response is proposed, under the assumption that the response and the censoring variable are conditionally independent given the covariates. This estimator is based on the conditional Kaplan-Meier estimator of Beran (1981), and happens to be an extension of the multivar...

Linear Wavelet-Based Estimation for Derivative of a Density under Random Censorship

In this paper we consider estimation of the derivative of a density based on wavelet methods using randomly right-censored data. We extend the results on asymptotic convergence rates due to Prakasa Rao (1996) and Chaubey et al. (2008) to the random censorship model. Our treatment is facilitated by results of Stute (1995) and Li (2003) that enable us to demonstrate that the same con...

Testing additivity in nonparametric regression under random censorship

In this paper, we are concerned with nonparametric estimation of the multivariate regression function in the presence of right censored data. More precisely, we propose a statistic that is shown to be asymptotically normally distributed under the additive assumption, and that could be used to test for additivity in the censored regression setting.

Logspline Density Estimation under Censoring and Truncation

In this paper we consider logspline density estimation for data that may be left-truncated or right-censored. For randomly left-truncated and right-censored data the product-limit estimator is known to be a consistent estimator of the survivor function, having a faster rate of convergence than many density estimators. The product-limit estimator and B-splines are used to construct the logspline ...

Publication date: 2004